Now that the DFPlayer appears to be working well with the MicroMite, the next logical step is to send strings of mp3 files to the player in order to build up sentences. For some projects a single mp3 file can be sent to the player to announce a stage of operation, e.g. one file stating "Alarm System Set". However, what I have in mind are projects that need to state a floating point value verbally, for example:
"The battery voltage is","twelve","point","six","seven","Volts"
"The barometric pressure is","one","thousand","and","ninety","six","millibar"
or
"The distance is","two","hundred","and","forty","five","centimeters"
I have approached the problem in stages, solving each one in order to move onto the next. Broadly, the steps that I have solved in order to achieve the objective are:
- Break down a floating point value into Sign, Thousands, Hundreds, Tens, Units and Fractions.
- Translate the above elements into 'real words' in order to send them to the mp3 player.
- Sequentially send the mp3 equivalents of the words that represent the number to the DFPlayer.
Ideally I was looking to perform speech conversions on numbers up to +/- 999999.
Program Description
The first version of the numerical speech engine is listed at the bottom of this page, and as you will see two files are available for download. The only difference between them is that the first includes routines to show the various stages of the conversion process before the files are sequentially played on the DFPlayer, whereas the second only sends the mp3 sequences to the DFPlayer.
To deal with the various stages of conversion, several routines are used that broadly fall in line with the steps described above. What follows is a description of the routines that were required to achieve the desired outcome.
process_number(float)
This routine takes a floating point number and translates it into sign, thousands, hundreds, tens_units and fraction. For example, a float of 4345.76 will result in a "+" sign, "4" thousands, "3" hundreds, "45" tens_units and a fraction of "76". The dissected results are then fed into the next routine in order to generate a usable word sequence.
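The split described above can be sketched in a few lines. This is a Python illustration of the idea rather than the MMBasic source; the function name mirrors the routine, but the return format and the choice to read the fraction digits from a fixed-point string (limited here to six decimal places) are assumptions.

```python
def process_number(value):
    """Split a number into sign, thousands, hundreds, tens_units and fraction digits.

    Illustrative sketch only: the fraction is returned as a string of digits
    so that each digit can later be spoken individually.
    """
    if abs(value) > 999999:
        raise ValueError("value is outside the +/- 999999 range")
    sign = "-" if value < 0 else "+"
    # Fixed-point text avoids binary rounding noise; trailing zeros are dropped.
    text = format(abs(value), ".6f").rstrip("0").rstrip(".")
    whole_text, _, fraction = text.partition(".")
    whole = int(whole_text)
    thousands = whole // 1000          # 0..999
    hundreds = (whole % 1000) // 100   # 0..9
    tens_units = whole % 100           # 0..99
    return sign, thousands, hundreds, tens_units, fraction


print(process_number(4345.76))
```

For 4345.76 this gives a "+" sign, 4 thousands, 3 hundreds, 45 tens_units and the fraction digits "76", matching the example in the description.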
speech_translate()
This routine is used to perform the word translation for the above number elements. In reality this routine calls several other routines as shown below, with the final subroutine add_to_speech_array() being used to deposit the equivalent mp3 file names for each word required into a 20 element array.
speech_translate() -> number_translate() -> tens_units() -> add_to_speech_array()
hundreds() -> add_to_speech_array()
This then builds up a file name array that represents the spoken form of the initial floating point value.
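As a rough illustration of the translation stage, the Python sketch below turns the dissected elements into a list of mp3 file numbers. The file numbers for "hundred" (21), "thousand" (22), "and" (23), "point" (24), "minus" (25) and "fifty" (50) are taken from the example output on this page; the assumption that 0 to 20 and the other tens (30, 40, 60 and so on) are stored under their own values, and the helper names, are mine and not the MMBasic routines.

```python
# File numbers seen in the example output; the other tens are assumed.
HUNDRED, THOUSAND, AND, POINT, MINUS = 21, 22, 23, 24, 25


def tens_units(n):
    """Files for 0..99: 0..20 directly, otherwise a tens file plus a units file."""
    if n <= 20:
        return [n]
    tens, units = divmod(n, 10)
    return [tens * 10] + ([units] if units else [])


def below_thousand(n):
    """Files for 1..999, e.g. 654 -> six hundred and fifty four."""
    files = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        files += [hundreds, HUNDRED]
        if rest:
            files.append(AND)
    if rest:
        files += tens_units(rest)
    return files


def speech_translate(sign, thousands, hundreds, tens_units_value, fraction):
    """Return the list of file numbers that speak the dissected number."""
    files = [MINUS] if sign == "-" else []
    if thousands:
        files += below_thousand(thousands) + [THOUSAND]
    rest = hundreds * 100 + tens_units_value
    if rest:
        if thousands and not hundreds:
            files.append(AND)          # "one thousand and ninety six"
        files += below_thousand(rest)
    elif not thousands:
        files.append(0)                # a bare "zero" before any fraction
    if fraction:
        files.append(POINT)
        files += [int(digit) for digit in fraction]  # digits are read singly
    return files


print(speech_translate("+", 654, 3, 21, ""))
```

With these assumptions, 654321 yields the same file sequence as the non-empty part of the speech array in Example #1.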
play_speech()
This subroutine runs through the array playing each mp3 file, waiting for the file playback to end and then moving on to the next. Eventually it should be possible to have this playback managed by a time-triggered interrupt routine, but for now this is simply done by the main program.
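The blocking playback loop can be sketched as follows. This is a Python outline of the logic only: play_file and is_busy are hypothetical callables standing in for whatever sends the play command to the DFPlayer and reads its busy signal, and 256 is the empty-slot marker used by the speech array.

```python
import time

EMPTY = 256  # marker for an unused slot in the speech array


def play_speech(speech_array, play_file, is_busy, poll_s=0.01):
    """Play each file in turn, blocking until the player reports idle.

    play_file(n) and is_busy() are placeholders for the hardware access;
    they are assumptions of this sketch, not real library calls.
    """
    for file_number in speech_array:
        if file_number == EMPTY:
            continue                   # skip unused slots
        play_file(file_number)
        while is_busy():               # wait for this clip to finish
            time.sleep(poll_s)
```

On real hardware the busy signal may take a short time to assert after the play command is sent, so a small delay before the polling loop might be needed; that detail is left out here.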
clr_speech_array()
Finally, this subroutine is used to clear down the 20 element array. The null value is set to 256, as the file range of the DFPlayer is 1 to 255.
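A minimal Python sketch of the array handling is given below. The 20 element size and the 256 null value come from the description; the right-aligned layout is inferred from the example output further down the page, and load_speech_array is a hypothetical helper name.

```python
ARRAY_SIZE = 20
EMPTY = 256  # outside the file range, so it can never be a valid file


def clr_speech_array():
    """Return a cleared array with every slot set to the null value."""
    return [EMPTY] * ARRAY_SIZE


def load_speech_array(files):
    """Right-align the file numbers, padding the unused slots with EMPTY."""
    if len(files) > ARRAY_SIZE:
        raise ValueError("sentence needs more than 20 files")
    return [EMPTY] * (ARRAY_SIZE - len(files)) + list(files)


print(load_speech_array([25, 0, 24, 7, 6, 2, 3, 4]))
```

Using a value outside the valid file range as the marker means the playback loop can skip unused slots with a single comparison.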
SD File Format
The files must be stored in the '01' folder used in the DFPlayer project; this folder, in zipped form, is included in the links at the bottom of this page. The minimum files required to make the system work are as follows:
As the maximum number of files within this directory is 255, there are plenty of spares for future expansion that could be used to hold the additional sentences mentioned above, such as "the battery voltage is", or the units of measurement for the project, e.g. "millimeter" or "millibar".
The following website has been used to create the text to speech files for the project.
http://www.fromtexttospeech.com/
For the attached files below, the following settings have been used:
- British English
- Emma
- Very Fast
This website has been used instead of the https://ttsmp3.com/ site, used in the original DFPlayer Project, only because it includes additional options that allow faster tempo phrases to be generated, and so produces an overall faster read back of the initial floating point number.
Example Floating Point Conversion
Some examples of the V1_00A program are given below. For these example programs, pin 2 of the MicroMite28 has been used for the busy (playback) signal of the DFPlayer module.
Once the program is downloaded to the MicroMite device and the correct file structure is added to the DFPlayer's microSD card, the program can be run and the user is asked to input a number between +/- 999999. This is then processed into the speech number, which is the outcome of the process_number(float) subroutine and indicates the number's sign, thousands, hundreds, tens_units and fraction values.
Following this, the contents of the speech array are listed, and by making reference to the mp3 list above these contents should hopefully make sense.
Then finally for visual feedback the array is translated into the equivalent words so that it can be visually checked without having to make use of the above table.
As stated above both programs V1_00A and V1_00B operate in the same manner but V1_00B provides no visual feedback and is therefore intended as the basis for future projects.
Example #1
what is the number
? 654321
input to NSE : 654321
speech number: + 654,321.
speech array: 256 256 256 256 256 256 256 256 256 6 21 23 50 4 22 3 21 23 20 1
speech: six hundred and fifty four thousand three hundred and twenty one
Example #2
what is the number
? -0.76234
input to NSE : -0.76234
speech number: - ,0.76234
speech array: 256 256 256 256 256 256 256 256 256 256 256 256 25 0 24 7 6 2 3 4
speech: minus zero point seven six two three four
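The visual feedback step, turning the speech array back into words, can be sketched in Python as below. The word table is partly inferred: the entries for 0 to 7, 20, 21 to 25 and 50 follow from the two examples above, while the remaining entries are assumed to follow the same pattern.

```python
EMPTY = 256

# 0..20 are assumed to be stored under their own values.
WORDS = dict(enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty".split()
))
WORDS.update({21: "hundred", 22: "thousand", 23: "and", 24: "point", 25: "minus"})
WORDS.update({30: "thirty", 40: "forty", 50: "fifty", 60: "sixty",
              70: "seventy", 80: "eighty", 90: "ninety"})


def array_to_words(speech_array):
    """Join the words for every used slot, skipping the 256 null entries."""
    return " ".join(WORDS[n] for n in speech_array if n != EMPTY)


example_1 = [256] * 9 + [6, 21, 23, 50, 4, 22, 3, 21, 23, 20, 1]
print(array_to_words(example_1))
```

Run on the array from Example #1, this prints the same sentence as the "speech:" line of that example.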
The Next Step
The next step is to use this system in a practical project and I am currently working on one or two ideas.
In the future I am also considering increasing the dimensions of the playback array to 2x20 so that the playback folder can also be specified, which would increase the number of files available for playback. I would also like to make the playback element of the program a time-triggered interrupt, so that the main program can continue to operate instead of waiting for each mp3 to complete its playback.
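One possible shape for that future version is sketched below in Python. It is purely speculative: each entry pairs a folder with a file number, and a tick() method, intended to be called from a periodic timer, starts the next clip only when the player is idle. The class name and the play_file and is_busy callables are hypothetical.

```python
from collections import deque


class SpeechQueue:
    """Non-blocking playback queue of (folder, file) pairs."""

    def __init__(self, play_file, is_busy):
        self._pending = deque()
        self._play_file = play_file    # placeholder: play_file(folder, file)
        self._is_busy = is_busy        # placeholder: True while a clip plays

    def add(self, folder, file_number):
        self._pending.append((folder, file_number))

    def tick(self):
        """Start the next clip if the player is idle.

        Returns True while playback is in progress or clips remain queued.
        """
        if self._is_busy():
            return True
        if self._pending:
            folder, file_number = self._pending.popleft()
            self._play_file(folder, file_number)
            return True
        return False
```

Because tick() returns immediately, the main program is free to carry on with other work between calls.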